
58 research outputs found

Recoverable One-dimensional Encoding of Three-dimensional Protein Structures

Author: A. R. Kinjo
Berman
Havel
K. Nishikawa
Kabsch
Kinjo
Nakai
Plaxco
Porto
Vendruscolo
Wang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2005
Field of study

Protein one-dimensional (1D) structures such as secondary structure and contact number provide intuitive pictures to understand how the native three-dimensional (3D) structure of a protein is encoded in the amino acid sequence. However, it has not been clear whether a given set of 1D structures contains sufficient information for recovering the underlying 3D structure. Here we show that the 3D structure of a protein can be recovered from a set of three types of 1D structures, namely, secondary structure, contact number and residue-wise contact order, which is introduced here for the first time. Using simulated annealing molecular dynamics simulations, the structures satisfying the given native 1D structural restraints were sought for 16 proteins of various structural classes and of sizes ranging from 56 to 146 residues. By selecting the structures best satisfying the restraints, all the proteins showed a coordinate RMS deviation of less than 4 Å from the native structure, and for most of them, the deviation was even less than 2 Å. The present result opens a new possibility for protein structure prediction and for our understanding of the sequence-structure relationship. Comment: Corrected title. No change in content.

arXiv.org e-Print Archive

CiteSeerX

Crossref

On the optimal contact potential of proteins

Author: Akira R. Kinjo
Anfinsen
Bastolla
Cao
Cao
Cheng
Chikenji
Fleming
Graña
Gō
Hao
Horn
Horn
Ishida
Kabakçıoğlu
Kinjo
Kinjo
Kinjo
Kinjo
Li
Mitsutake
Miyazawa
Miyazawa
Munson
Porto
Sanzo Miyazawa
Takada
Taketomi
Vendruscolo
Vendruscolo
Vendruscolo
Vullo
Yuan
Publication venue: 'Elsevier BV'
Publication date: 02/01/2008

We analytically derive the lower bound of the total conformational energy of a protein structure by assuming that the total conformational energy is well approximated by the sum of sequence-dependent pairwise contact energies. The condition for the native structure achieving the lower bound leads to the contact energy matrix that is a scalar multiple of the native contact matrix, i.e., the so-called Go potential. We also derive spectral relations between contact matrix and energy matrix, and approximations related to one-dimensional protein structures. Implications for protein structure prediction are discussed. Comment: 5 pages, text onl
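To make the Go potential mentioned in this abstract concrete, here is a minimal sketch, assuming binary (0/1) contact matrices and an arbitrary energy scale `epsilon`. The helper names (`symmetric_contacts`, `go_potential`, `contact_energy`) and the toy contact lists are hypothetical and are not taken from the paper. The sketch only illustrates that an energy matrix proportional to the native contact matrix rewards native contacts; it does not reproduce the analytical lower bound.

```python
def symmetric_contacts(n, pairs):
    """Build a symmetric 0/1 contact matrix from a list of (i, j) pairs."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in pairs:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def go_potential(native_contacts, epsilon=1.0):
    """Energy matrix that is a scalar multiple (-epsilon) of the native contact matrix."""
    return [[-epsilon * c for c in row] for row in native_contacts]


def contact_energy(contacts, energy):
    """Total energy as the sum of E_ij * C_ij over residue pairs i < j."""
    n = len(contacts)
    return sum(energy[i][j] * contacts[i][j]
               for i in range(n) for j in range(i + 1, n))


# Toy 5-residue example (contact lists are made up for illustration).
native = symmetric_contacts(5, [(0, 2), (0, 4), (1, 3)])
decoy = symmetric_contacts(5, [(0, 2), (1, 4), (2, 4)])  # shares one native contact
energy = go_potential(native, epsilon=1.0)

print(contact_energy(native, energy))  # -3.0: every native contact is rewarded
print(contact_energy(decoy, energy))   # -1.0: only the shared contact counts
```

Under this potential, a conformation's energy is simply minus `epsilon` times the number of native contacts it retains, so the native conformation scores lowest in this toy comparison.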

arXiv.org e-Print Archive

Crossref

Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families

Author: Aiman Soliman
Anand Padmanabhan
Junjun Yin
Kiumars Soltani
Shaowen Wang
Publication venue
Publication date: 01/01/2018

In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds, whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such a transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest that some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models. Comment: 13 pages, 7 figures, 2 tables (a new subsection added

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

FigShare

Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks

Author: Altschul S. F., Madden, T. L., Sch
Baldi P., Brunak, S., Frasconi, P.
CHANDONIA J-M
Crooks G. E. &
Kinjo A. R. &
Kinjo A. R. &
Kinjo A. R., Horimoto, K. &
Lee B. &
Li W., Jaroszewski, L. &
Nishikawa K. &
Pollastri G., Baldi, P., Fariselli
Rost B.
TATENO Y
Publication venue: 'Biophysical Society of Japan'
Publication date: 01/01/2005

Prediction of one-dimensional protein structures such as secondary structures and contact numbers is useful for the three-dimensional structure prediction and important for the understanding of sequence-structure relationship. Here we present a new machine-learning method, critical random networks (CRNs), for predicting one-dimensional structures, and apply it, with position-specific scoring matrices, to the prediction of secondary structures (SS), contact numbers (CN), and residue-wise contact orders (RWCO). The present method achieves, on average, Q_3 accuracy of 77.8% for SS, correlation coefficients of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS prediction is comparable to other state-of-the-art methods, and that of the CN prediction is a significant improvement over previous methods. We give a detailed formulation of critical random networks-based prediction scheme, and examine the context-dependence of prediction accuracies. In order to study the nonlinear and multi-body effects, we compare the CRNs-based method with a purely linear method based on position-specific scoring matrices. Although not superior to the CRNs-based method, the surprisingly good accuracy achieved by the linear method highlights the difficulty in extracting structural features of higher order from amino acid sequence beyond that provided by the position-specific scoring matrices. Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for publication in BIOPHYSIC
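The two evaluation measures quoted in this abstract can be sketched as follows, assuming the usual definitions: Q_3 as the fraction of residues whose three-state secondary structure label (H, E or C) is predicted correctly, and the correlation coefficient as the Pearson correlation between predicted and observed per-residue values. The function names and toy inputs are hypothetical and are not taken from the paper.

```python
import math


def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy example: 7 of 8 labels agree.
print(q3_accuracy("HHHEECCC", "HHHECCCC"))  # 0.875

# Toy contact-number profiles (made-up values).
print(pearson_cc([10, 14, 18, 12], [11, 15, 17, 12]))
```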

arXiv.org e-Print Archive

Crossref

Properties of contact matrices induced by pairwise interactions in proteins

Author: A. G. Murzin
Akira R. Kinjo
B. Bollobás
K. Nishikawa
K. Nishikawa
R. A. Horn
Sanzo Miyazawa
Publication venue: 'American Physical Society (APS)'
Publication date: 31/08/2011

The total conformational energy is assumed to consist of pairwise interaction energies between atoms or residues, each of which is expressed as a product of a conformation-dependent function (an element of a contact matrix, C-matrix) and a sequence-dependent energy parameter (an element of a contact energy matrix, E-matrix). Such pairwise interactions in proteins force native C-matrices to be in a relationship as if the interactions are a Go-like potential [N. Go, Annu. Rev. Biophys. Bioeng. 12, 183 (1983)] for the native C-matrix, because the lower bound of the total energy function is equal to the total energy of the native conformation interacting in a Go-like pairwise potential. This relationship between C- and E-matrices corresponds to (a) a parallel relationship between the eigenvectors of the C- and E-matrices and a linear relationship between their eigenvalues, and (b) a parallel relationship between a contact number vector and the principal eigenvectors of the C- and E-matrices; the E-matrix is expanded in a series of eigenspaces with an additional constant term, which corresponds to a threshold of contact energy that approximately separates native contacts from non-native ones. These relationships are confirmed in 182 representatives from each family of the SCOP database by examining inner products between the principal eigenvector of the C-matrix, that of the E-matrix evaluated with a statistical contact potential, and a contact number vector. In addition, the spectral representation of C- and E-matrices reveals that pairwise residue-residue interactions, which depend only on the types of interacting amino acids but not on other residues in a protein, are insufficient and other interactions including residue connectivities and steric hindrance are needed to make native structures the unique lowest energy conformations. Comment: Errata in DOI:10.1103/PhysRevE.77.051910 has been corrected in the present versio

arXiv.org e-Print Archive

Crossref

CRNPRED: highly accurate prediction of one-dimensional protein structures by large-scale critical random networks

Author: Kinjo Akira R
Nishikawa Ken
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: One-dimensional protein structures such as secondary structures or contact numbers are useful for three-dimensional structure prediction and helpful for intuitive understanding of the sequence-structure relationship. Accurate prediction methods will serve as a basis for these and other purposes. RESULTS: We implemented a program CRNPRED which predicts secondary structures, contact numbers and residue-wise contact orders. This program is based on a novel machine learning scheme called critical random networks. Unlike most conventional one-dimensional structure prediction methods which are based on local windows of an amino acid sequence, CRNPRED takes into account the whole sequence. CRNPRED achieves, on average per chain, Q_3 = 81% for secondary structure prediction, and correlation coefficients of 0.75 and 0.61 for contact number and residue-wise contact order predictions, respectively. CONCLUSION: CRNPRED will be a useful tool for computational as well as experimental biologists who need accurate one-dimensional protein structure predictions.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting residue-wise contact orders in proteins by support vector regression

Author: A Bairoch
AG Murzin
AR Kinjo
AR Kinjo
AR Kinjo
AR Kinjo
B Rost
CH Tsai
D Kihara
D Sarda
DT Jones
G Pollastri
G Pollastri
GP Raghava
HM Berman
J Song
J Wang
Jiangning Song
JM Chandonia
Kevin Burrage
KW Plaxco
M Punta
MPS Brown
NP Prabhu
S Ahmad
S Hua
S Hua
V Vapnik
V Vapnik
W Kabsch
W Liu
X Wang
Z Yuan
Z Yuan
Z Yuan
Z Yuan
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. RESULTS: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance, with a CC of 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. CONCLUSION: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
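The RWCO described in this abstract can be illustrated with a minimal sketch: for each residue, sum the sequence separations to all residues in spatial contact with it. Several details below are assumptions and are not stated in the abstract: the hard distance cutoff, the minimum sequence separation `min_separation`, the optional normalization by chain length, and the function name itself. Published definitions may differ, for example by using a smooth contact function.

```python
import math


def residue_wise_contact_orders(coords, cutoff=12.0, min_separation=3, normalize=True):
    """Per-residue sum of sequence separations |i - j| over contacting residues j.

    coords: one (x, y, z) tuple per residue, in sequence order.
    cutoff, min_separation, normalize: illustrative assumptions, not values from the paper.
    """
    n = len(coords)
    rwco = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            separation = j - i
            if separation < min_separation:
                continue  # skip near neighbours along the chain
            if math.dist(coords[i], coords[j]) <= cutoff:
                rwco[i] += separation
                rwco[j] += separation
    if normalize and n > 0:
        rwco = [value / n for value in rwco]
    return rwco


# Toy hairpin: residues 0-2 run along one strand, residues 3-5 run back along another.
hairpin = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (8, 4, 0), (4, 4, 0), (0, 4, 0)]
print(residue_wise_contact_orders(hairpin, cutoff=4.5, normalize=False))
# [5.0, 3.0, 0.0, 0.0, 3.0, 5.0]
```

In the toy hairpin, the chain ends pair with each other across the largest sequence separation, so they receive the highest values, which is the sense in which RWCO reflects long-range contacts.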

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

University of Queensland eSpace

Nature of protein family signatures: Insights from singular value analysis of position-specific scoring matrices

Author: A Bundi
A Kidera
AG Murzin
Akira R. Kinjo
AR Kinjo
AR Kinjo
AR Kinjo
AR Kinjo
AR Kinjo
AR Kinjo
B Qian
B Rost
BE Suzek
C Barber
C Rosano
D Bashford
David Jones
DT Jones
DT Jones
F Beghin
FM Richards
G Wang
Haruki Nakamura
HM Berman
J Kyte
JL Fauchère
JO Wrabl
JT Lecomte
JU Bowie
JU Bowie
K Nakai
K Nishikawa
K Nishikawa
K Tomii
M Charton
M Gribskov
M Kann
M Levitt
M Oobatake
M Ota
M Ota
M Porto
MG Rudolph
MO Dayhoff
P Klein
P Koehl
P Pokarowski
PHA Sneath
R Aurora
R Durbin
R Grantham
RA Horn
RD Finn
RF Doolittle
RM Sweet
S Fukuchi
S Henikoff
S Kawashima
S Miyazawa
SF Altschul
SF Altschul
SR Eddy
T Ishida
TM Cover
U Bastolla
WE Royer Jr
WR Taylor
Z Yuan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/11/2007

Position-specific scoring matrices (PSSMs) are useful for detecting weak homology in protein sequence analysis, and they are thought to contain some essential signatures of the protein families. In order to elucidate what kind of ingredients constitute such family-specific signatures, we apply singular value decomposition to a set of PSSMs and examine the properties of dominant right and left singular vectors. The first right singular vectors were correlated with various amino acid indices including relative mutability, amino acid composition in protein interior, hydropathy, or turn propensity, depending on proteins. A significant correlation between the first left singular vector and a measure of site conservation was observed. It is shown that the contribution of the first singular component to the PSSMs acts to disfavor potentially but falsely functionally important residues at conserved sites. The second right singular vectors were highly correlated with hydrophobicity scales, and the corresponding left singular vectors with contact numbers of protein structures. It is suggested that sequence alignment with a PSSM is essentially equivalent to threading supplemented with functional information. The presented method may be used to separate functionally important sites from structurally important ones, and thus it may be a useful tool for predicting protein functions. Comment: 22 pages, 7 figures, 4 table

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Better prediction of protein contact number using a support vector regression analysis of amino acid sequence

Author: Yuan Zheng
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of Cβ atoms in other residues within a sphere around the Cβ atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. RESULTS: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. CONCLUSION: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.
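The contact number definition given in this abstract (the number of Cβ atoms of other residues within a sphere around a residue's own Cβ atom) can be sketched directly. The sphere radius is not stated in the abstract, so the `cutoff` default below is an assumption, as are the function name and the toy coordinates.

```python
import math


def contact_numbers(cb_coords, cutoff=12.0):
    """For each residue, count the other residues whose C-beta lies within `cutoff` angstroms.

    cb_coords: one (x, y, z) C-beta position per residue.
    cutoff: sphere radius in angstroms (an assumed value, not taken from the paper).
    """
    n = len(cb_coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(cb_coords[i], cb_coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy example: four pseudo-residues on a line, 5 angstroms apart.
line = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
print(contact_numbers(line, cutoff=6.0))   # [1, 2, 2, 1]
print(contact_numbers(line, cutoff=10.0))  # [2, 3, 3, 2]
```

The interior pseudo-residues have more neighbours than the ends, mirroring how buried residues in a folded protein tend to have higher contact numbers than exposed ones.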

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Protein contact order prediction from primary sequences

Author: Bmc Bioinformatics
David Arndt
David S Wishart
Guohui Lin
Jianjun Zhou
Yi Shi
Publication venue: BioMed Central
Publication date: 01/01/2008

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central